coarse mask
- Workflow (0.46)
- Research Report (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Supplementary for Mixed Supervised Object Detection by Transferring Mask Prior and Semantic Similarity
Our ablation studies (Table 3 in the main paper) have already demonstrated the advantage of the mask prior. From Figure 2, we can see that the coarse masks indicate the rough locations of objects, which helps the object detection network predict bounding boxes. To validate the transferability of our similarity transfer, we evaluate our similarity network trained on the COCO-60 trainval set. We treat similarity prediction as a binary classification task, in which the binary label 1 (resp., 0) means that two bounding boxes belong to the same category (resp., different categories). The precision, recall, and F1 scores are summarized in Table 1. We observe that the gap between the performance of the similarity network on base categories and on novel categories is negligible (e.g., F1 scores of 84.9% vs.
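The evaluation protocol above (pairwise similarity scored as binary classification) can be sketched as follows; the box-pair labels and predictions are hypothetical placeholders, not numbers from the paper.

```python
def binary_prf1(labels, preds):
    """Precision, recall, and F1 for binary labels (1 = same category)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical box-pair labels: 1 = same category, 0 = different categories.
labels = [1, 1, 0, 0, 1, 0]
preds  = [1, 0, 0, 1, 1, 0]
p, r, f1 = binary_prf1(labels, preds)
```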
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
In this paper, we explore a principal way to enhance the quality of object masks produced by different segmentation models. We propose a model-agnostic solution called SegRefiner, which offers a novel perspective on this problem by interpreting segmentation refinement as a data generation process. As a result, the refinement process can be smoothly implemented through a series of denoising diffusion steps. Specifically, SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process.
Supplementary Material for SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
Anonymous Author(s)
1 Implementation Details
The overall workflows of the training and inference processes are provided in Alg. 1 and Alg. 2. Model Architecture: Following [9], we use a U-Net with 4-channel input and 1-channel output. Both input and output resolutions are set to 256×256. Training Settings: All experiments are conducted on 8 NVIDIA RTX 3090 GPUs with PyTorch. After a complete reverse diffusion process, the output is resized to the original size. We apply Non-Maximum Suppression (NMS, with 0.3 as the threshold) to these patches to remove redundant overlapping predictions. Our SegRefiner can robustly correct prediction errors both outside and inside the coarse mask.
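Standard greedy NMS, as applied to the patches above, can be sketched as follows; the boxes and scores are illustrative, and the 0.3 IoU threshold matches the setting quoted in the text.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.3):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = nms(boxes, scores, thresh=0.3)
```

The second box overlaps the first with IoU 0.81, so only the first and third survive.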
- Workflow (0.46)
- Research Report (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Unsupervised Instance Segmentation with Superpixels
Instance segmentation is essential for numerous computer vision applications, including robotics, human-computer interaction, and autonomous driving. Currently, popular models bring impressive performance in instance segmentation by training with a large number of human annotations, which are costly to collect. For this reason, we present a new framework that efficiently and effectively segments objects without the need for human annotations. Firstly, a MultiCut algorithm is applied to self-supervised features for coarse mask segmentation. Then, a mask filter is employed to obtain high-quality coarse masks. To train the segmentation network, we compute a novel superpixel-guided mask loss, comprising hard loss and soft loss, with high-quality coarse masks and superpixels segmented from low-level image features. Lastly, a self-training process with a new adaptive loss is proposed to improve the quality of predicted masks. We conduct experiments on public datasets in instance segmentation and object detection to demonstrate the effectiveness of the proposed framework. The results show that the proposed framework outperforms previous state-of-the-art methods.
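One way to read the superpixel-guided mask loss described above is as a hard per-pixel term against the coarse mask plus a soft term that pulls each prediction toward its superpixel's mean foreground value. This is a minimal sketch under that assumption; the paper's exact formulation may differ, and `superpixel_mask_loss` and its `alpha` weight are hypothetical names.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a single pixel, with clipping for stability."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def superpixel_mask_loss(pred, coarse, superpixels, alpha=0.5):
    """Hard BCE against the coarse mask plus a soft term toward the
    mean coarse-mask value of each pixel's superpixel (illustrative)."""
    totals = {}
    for c, s in zip(coarse, superpixels):
        tot, cnt = totals.get(s, (0.0, 0))
        totals[s] = (tot + c, cnt + 1)
    soft_target = {s: t / n for s, (t, n) in totals.items()}
    hard = sum(bce(p, c) for p, c in zip(pred, coarse)) / len(pred)
    soft = sum((p - soft_target[s]) ** 2
               for p, s in zip(pred, superpixels)) / len(pred)
    return alpha * hard + (1 - alpha) * soft

loss = superpixel_mask_loss([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0], [0, 0, 1, 1])
```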
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- South America > Chile > Arica y Parinacota Region > Arica Province > Arica (0.04)
- Europe > Poland (0.04)
- Transportation > Ground > Road (0.34)
- Information Technology (0.34)
- Automobiles & Trucks (0.34)
Segment Concealed Objects with Incomplete Supervision
He, Chunming; Li, Kai; Zhang, Yachao; Yang, Ziyun; Pang, Youwei; Tang, Longxiang; Fang, Chengyu; Zhang, Yulun; Kong, Linghe; Li, Xiu; Farsiu, Sina
Incompletely-Supervised Concealed Object Segmentation (ISCOS) involves segmenting objects that seamlessly blend into their surrounding environments, utilizing incompletely annotated data, such as weak and semi-annotations, for model training. This task remains highly challenging due to (1) the limited supervision provided by the incompletely annotated training data, and (2) the difficulty of distinguishing concealed objects from the background, which arises from the intrinsic similarities in concealed scenarios. In this paper, we introduce the first unified method for ISCOS to address these challenges. To tackle the issue of incomplete supervision, we propose a unified mean-teacher framework, SEE, that leverages the vision foundation model, "Segment Anything Model (SAM)", to generate pseudo-labels using coarse masks produced by the teacher model as prompts. To mitigate the effect of low-quality segmentation masks, we introduce a series of strategies for pseudo-label generation, storage, and supervision. These strategies aim to produce informative pseudo-labels, store the best pseudo-labels generated, and select the most reliable components to guide the student model, thereby ensuring robust network training. Additionally, to tackle the issue of intrinsic similarity, we design a hybrid-granularity feature grouping module that groups features at different granularities and aggregates these results. By clustering similar features, this module promotes segmentation coherence, facilitating more complete segmentation for both single-object and multiple-object images. We validate the effectiveness of our approach across multiple ISCOS tasks, and experimental results demonstrate that our method achieves state-of-the-art performance. Furthermore, SEE can serve as a plug-and-play solution, enhancing the performance of existing models.
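Two ingredients of the mean-teacher pipeline described above are standard and can be sketched briefly: the teacher tracking an exponential moving average (EMA) of the student's weights, and keeping only high-confidence pseudo-label pixels. The momentum and threshold values are illustrative, not the paper's settings.

```python
def ema_update(teacher, student, momentum=0.999):
    """Mean-teacher EMA step: teacher weights slowly track the student's."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher, student)]

def select_reliable(pseudo_probs, thresh=0.95):
    """Keep only confident pseudo-label pixels; uncertain ones become
    None and would be ignored by the supervision loss."""
    out = []
    for p in pseudo_probs:
        if p >= thresh:
            out.append(1)
        elif p <= 1 - thresh:
            out.append(0)
        else:
            out.append(None)
    return out
```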
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
- Health & Medicine (1.00)
- Education (1.00)
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process
In this paper, we explore a principal way to enhance the quality of object masks produced by different segmentation models. We propose a model-agnostic solution called SegRefiner, which offers a novel perspective on this problem by interpreting segmentation refinement as a data generation process. As a result, the refinement process can be smoothly implemented through a series of denoising diffusion steps. Specifically, SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process. To assess the effectiveness of SegRefiner, we conduct comprehensive experiments on various segmentation tasks, including semantic segmentation, instance segmentation, and dichotomous image segmentation.
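The reverse process described above (iteratively refining a coarse mask over discrete diffusion steps) can be sketched as a loop in which a denoiser proposes a refined mask with per-pixel confidences, and confident pixels switch state. This is a generic illustration, not SegRefiner's actual transition rule; `denoise_step` stands in for the trained U-Net.

```python
import random

def refine_mask(coarse, denoise_step, steps=6, seed=0):
    """Illustrative reverse loop for discrete-diffusion mask refinement.
    At each step, pixels adopt the denoiser's refined value with a
    probability given by the denoiser's per-pixel confidence."""
    rng = random.Random(seed)  # seeded for reproducibility
    mask = list(coarse)
    for t in range(steps, 0, -1):
        refined, conf = denoise_step(mask, t)
        mask = [r if rng.random() < c else m
                for m, r, c in zip(mask, refined, conf)]
    return mask

# Toy denoiser that is fully confident in a fixed target mask.
denoise = lambda mask, t: ([1, 1, 0, 0], [1.0, 1.0, 1.0, 1.0])
out = refine_mask([0, 1, 1, 0], denoise, steps=3)
```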
AKGNet: Attribute Knowledge-Guided Unsupervised Lung-Infected Area Segmentation
Lung-infected area segmentation is crucial for assessing the severity of lung diseases. However, existing image-text multi-modal methods typically rely on labour-intensive annotations for model training, posing challenges regarding time and expertise. To address this issue, we propose a novel attribute knowledge-guided framework for unsupervised lung-infected area segmentation (AKGNet), which achieves segmentation solely based on image-text data without any mask annotation. AKGNet facilitates text attribute knowledge learning, attribute-image cross-attention fusion, and high-confidence-based pseudo-label exploration simultaneously. It can learn statistical information and capture spatial correlations between image and text attributes in the embedding space, iteratively refining the mask to enhance segmentation. Specifically, we introduce a text attribute knowledge learning module by extracting attribute knowledge and incorporating it into feature representations, enabling the model to learn statistical information and adapt to different attributes. Moreover, we devise an attribute-image cross-attention module by calculating the correlation between attributes and images in the embedding space to capture spatial dependency information, thus selectively focusing on relevant regions while filtering irrelevant areas. Finally, a self-training mask improvement process is employed by generating pseudo-labels using high-confidence predictions to iteratively enhance the mask and segmentation. Experimental results on a benchmark medical image dataset demonstrate the superior performance of our method compared to state-of-the-art segmentation techniques in unsupervised scenarios.
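The attribute-image cross-attention described above follows the generic pattern of attribute embeddings attending over image-patch embeddings (queries from text attributes, keys/values from image features). A minimal single-head sketch, not the paper's exact module; the toy vectors below are placeholders.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(attr_q, img_kv, dim):
    """Single-head cross-attention: attribute embeddings are queries,
    image-patch embeddings serve as both keys and values."""
    out = []
    for q in attr_q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in img_kv]
        w = softmax(scores)
        out.append([sum(wi * v[d] for wi, v in zip(w, img_kv))
                    for d in range(dim)])
    return out

# The attribute query aligns with the first image patch, so the output
# is dominated by that patch's value vector.
fused = cross_attention([[1.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], dim=2)
```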
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- Europe > Finland > Pirkanmaa > Tampere (0.04)
- Asia > Middle East > Qatar (0.04)
- Health & Medicine > Diagnostic Medicine > Imaging (0.94)
- Health & Medicine > Therapeutic Area (0.90)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)